This is a report of the findings and recommendations
of the Division of Instruction and Professional Development of the
National Education Association (NEA) on testing. NEA called for a
moratorium on standardized testing in 1972 and created the task force
on testing, whose work is summarized in this report. After an
introduction stating the problem, the document presents NEA
resolutions and new business items on testing. Included in this are
statements of task force beliefs, some of which are as follows: (a)
some measurement and evaluation in education is necessary; (b) certain
measurement and evaluation tools are either invalid, unreliable, out
of date, or unfair and should be withdrawn from use (sharply
criticized were standardized achievement and intelligence tests as
they affect bilingual/bicultural students); (c) the training of those
administering tests is inadequate, and schools of education, school
systems, and the testing industry must take this responsibility; (d)
there is overkill in the use of standardized tests; and (e) the National
Teacher Examinations are an improper tool and must not be used for
certification, selection, salary determination, tenure, dismissal,
and similar matters. The document includes recommendations for
immediate action and further study, "The Report of the Committee on
Accountability to the NEA Representative Assembly July 1973," and a
bibliography. (JA)
information for
professional excellence 



The Charter of the National Education Association states the 
purpose of the nation's largest independent professional organiza- 
tion: "To elevate the character and advance the interests of the 
profession of teaching and to promote the cause of education in 
the United States." 

Through its program of Instruction and Professional Development, 
the NEA has a growing commitment to professional excellence, 
a commitment that can only be realized by well-informed members 
who ultimately will take the necessary, concerted action to achieve 
this goal. But information, knowledge, and understanding are 
essential to the success of any action program to reach this goal.
Accordingly, documents such as this have been prepared for a
better informed membership.

At a time when information has become the currency of contemporary
society, our ability to gather, handle, and process this
information will to a large degree determine the direction of our 
profession and the quality of its policy. 

A major activity of the NEA's program for Instruction and Professional
Development, therefore, has to do with the "processing" of
information in a continuing effort to provide members with a synthesis
of the best, the most reliable, and the most useful information
related to the goal of professional excellence.

Your comments are invited on this document and on other IPD 
program activities. Also, your suggestions of other information 
topics for future consideration will be most welcome. For more in- 
formation about our program on professional excellence, write or 
call Instruction and Professional Development, National Education 
Association, 1201 16th Street, N.W., Washington, D.C. 20036.
Phone: (202) 833-4337.
INTRODUCTION

". . . how can you possibly award prizes when
everybody missed the target?" said Alice.
"Well," said the Queen, "some missed by more
than others, and we have a fine normal distribution
of misses, which means we can forget
the target." (Lewis Carroll, Alice's Adventures
in Wonderland)

When that perceptive math teacher, Charles Lutwidge Dodgson,
wrote the above allegory on testing and accountability, he hit on
a problem that is very much with us today. In fact, today the
problem has reached the proportions of a crisis in education, a
crisis that could not have occurred in the orderly nineteenth
century Dodgson knew, an elitist society where everyone knew his
place and had to keep it. This greatly enlarged problem of the
1970's may be the result of a growing distance between the goals
of society and the traditional goals of the schools. Have we in
education, like the Queen, lost sight of the target? John Cogley's
piece on p. 46 of the appendix explores this in more detail.

What should be the posture of the teaching profession in relation
to testing, measurement, assessment, and the specter of
accountability? A profession that is committed to excellence as a
national goal cannot avoid making judgments about what is "good,"
"better," and "best" for the public it serves; for the quality of
excellence will always be at one end of a continuum.

But the idea of excellence, when applied to an individual
student, must be considered as quite another matter since learning
and personal fulfillment are both private and singular processes
that occur individually. Clearly, much of the present misuse and
malpractice associated with standardized tests in schools is a
great barrier to the kind of individualized attention to learning
that is so widespread in the literature and so seldom found in
the classroom.

Despite what we know about the personal and individual nature
of the learning process, many schools are still very much given to
lock-step learning processes (classes, grades, tests, promotions,
textbooks) which seem antithetical to nearly all we know about
learning. Under such conditions, accountability for either student
or teacher must be based on an abstract and flawed exercise
in statistical futility. ("After you start school, kids, half of
you will be below average forevermore.") The fact is that
accountability will not work in such an arbitrarily structured learning
environment with its misuses of standardized testing and its
penchant for conformity.

The current interest in criterion-referenced tests rather
than standardized achievement (or norm-referenced) tests (see p. 49)
represents a more rational step toward answering questions about
what students have learned. Ralph Tyler puts the matter in perspective
on page 42 of the September-October 1973 issue of Today's
Education:

I think the term criterion-referenced has assumed importance
today because, in the past, testing in the
United States has been norm-referenced. Two kinds of
events influenced the so-called "Modern Testing Movement"
in Western society.

One was the effort to identify people who were subnormal
or superior in human functioning: the work of Binet
which, in this country, resulted some 50 years ago in the
development of the Stanford-Binet test.

The other was the development of the Army Alpha group
intelligence test, which was created by psychologists
during World War I to select out of several million men
those who were most likely to benefit from quick instruction
and who would be able to go into the many kinds of
jobs the military required. . . .
The basic purpose of this testing is to take a total
group and arrange them in some kind of order so that you
can say here is the top 10 percent and here is the bottom
10 percent. The population is arranged on a linear scale
from the best to the worst. This is called norm-referenced
testing.

When this type of test is being made, various test items
are tried out. If the items differentiate among the persons
tested, they are retained. So a typical achievement test
has about 80 percent of its items in the narrow range of difficulty
where between 40 and 60 percent of people tested get
the answer right.

If the purpose is to identify those who do best on the
total test and those who do poorest, this is an efficient
way to go about it. But if you are trying to answer the
question, "What have students learned?" you run into difficulties.
This is because (a) almost all items that most persons
can answer correctly are dropped from the typical
achievement test because they do not discriminate and (b)
those items that almost no one can answer correctly are also
dropped because they don't discriminate either.

Actually, instead of testing what our students have
learned, we have been using test items to differentiate some
students from other students.

Now, in this era of accountability when we are being
asked such questions as "Are pupils learning to read?" or
"Can they compute?" we need a different approach. We must
set up questions or exercises that are related to a particular
question; that is, they are criterion-referenced: the
criterion being whether or not pupils can read, compute,
understand, etc.

In other words, the new tests that are coming out are
criterion-referenced because they are judged for their validity
in terms of whether they really test what the schools are
trying to teach and not whether they differentiate the better
students from the poorer students.
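
Tyler's contrast between the two kinds of tests can be made concrete with a small sketch. The Python below is illustrative only: the response data, the function names, and the mastery rule are hypothetical, and the 40 to 60 percent band is taken from the passage quoted above rather than from any particular publisher's procedure. It computes the proportion of pupils answering each item correctly, shows which items a norm-referenced selection rule would retain, and then asks a criterion-referenced question directly.

```python
from typing import List

# Hypothetical scored responses: one row per pupil, one column per item
# (1 = correct, 0 = incorrect). Not drawn from any real test.
RESPONSES: List[List[int]] = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
]


def item_difficulty(responses: List[List[int]]) -> List[float]:
    """Return, for each item, the proportion of pupils who answered it correctly."""
    n_pupils = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_pupils for i in range(n_items)]


def norm_referenced_keep(p_values: List[float],
                         low: float = 0.40,
                         high: float = 0.60) -> List[int]:
    """Indices of items a norm-referenced rule would retain: those in the
    middle band of difficulty, where items spread pupils apart."""
    return [i for i, p in enumerate(p_values) if low <= p <= high]


def share_meeting_criterion(responses: List[List[int]],
                            items: List[int],
                            required: int) -> float:
    """Criterion-referenced summary: the share of pupils who answer at
    least `required` of the listed items correctly."""
    met = sum(1 for row in responses if sum(row[i] for i in items) >= required)
    return met / len(responses)


if __name__ == "__main__":
    p = item_difficulty(RESPONSES)
    print("proportion correct per item:", p)
    print("items kept by the norm-referenced rule:", norm_referenced_keep(p))
    print("share of pupils correct on both items 0 and 1:",
          share_meeting_criterion(RESPONSES, [0, 1], required=2))
```

In this made-up data, the item that every pupil answers correctly and the item that no pupil answers correctly both fall outside the band and would be dropped, even though the first is exactly the evidence that all pupils learned that task. The criterion-referenced summary keeps such items and reports the share of pupils who met the stated standard.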

Increasing interest in educational accountability may well pro- 
duce some improvements in the way schools are managed and in the 
amount of control teachers have over the learning process and the 
ways in which it is measured. The very idea of a search for
accountability within the labyrinth of the educational bureaucracy
promises positive results. But these potential improvements can
only be achieved through the concerted efforts of a strong and
well-informed teaching profession.

This document is an effort to provide Association leaders
with recent NEA policy statements and a range of other information
on current trends in the testing and measurement of student abilities,
disabilities, and achievement. Although increasingly there
are attempts being made to hold teachers accountable for the performance
of their students on a variety of test instruments, such
teachers are seldom directly involved in the selection of such
tests or in the subsequent interpretation and use of the resulting
scores and other data. Clearly, something is out of joint in
schools where the wrong tests are used for the wrong reasons with
results that are damaging or, at best, grossly deceiving to
students, parents, and teachers. It is, in fact, an intolerable
situation.

Since its Tenth National Conference on Civil and Human Rights
in February 1972 dealing with tests and the uses of tests as possible
violations of human and civil rights (see Bibliography,
p. 66), the NEA has accelerated its activity in this area through
selective court actions, through establishment of a national NEA
Task Force on Testing, and more recently, the NEA Committee on
Educational Accountability. One immediate result of this 1972
Conference was a moratorium on standardized testing issued by the
NEA Representative Assembly later that year.

At the 1972 Convention, the following items of new business
on the subject of testing were approved by the NEA's Representative
Assembly:

(Item 28) This Representative Assembly directs the
National Education Association to immediately call a
national moratorium on standardized testing and at the
same time set up a task force on standardized testing
to research and make its findings available to the 1975
Representative Assembly for further action.

(Item 51) The NEA shall establish a task force to deal
with the numerous and complex problems communicated to
it under the general heading of testing. This task
force shall report its findings and proposals for further
action at the 1973 Representative Assembly. (NEA
Handbook 1973, p. 37.)

Again this year, an NEA Resolution stated the problem:

73-36. Standardized Tests 

The National Education Association strongly encourages
the elimination of group standardized intelligence,
aptitude, and achievement tests to assess student
potential or achievement until completion of a
critical appraisal, review, and revision of current
testing programs.

The interim report of the NEA Task Force on Testing was
adopted by the NEA Representative Assembly in July 1973 and has
thus become Association policy. This report in its entirety is
included in this document beginning on page . As indicated
above, the final report of the Task Force will be presented to
the 1975 Representative Assembly. Between now and 1975 the work
of this Task Force will be of great importance since its final
report may have long-range implications for the united teaching
profession. Both the Chairman of the Testing Task Force,
Charles J. Sanders, and the NEA/IPD staff contact person for this
program, Bernard McKenna, will welcome your comments on the interim
report reprinted here and your suggestions for future Task
Force study.

In its interim report the NEA Task Force on Testing is concerned
not only with the question of whether there should be evaluation
but also with such questions as: What should be the nature of
evaluation? Who should conduct it? What should be the
professional preparation of those who conduct evaluation? How
should the results of evaluation be used?

Testing, of course, is only a part of the larger evaluation
process in American society. From the idea of "professional excellence"
to batting averages and checkbook balances, ours is a
culture of assessment, comparison, and evaluation. But the misuse
of tests in the schools is another matter since children are
individuals and as such do not lend themselves to group manipulation.
Members of the Task Force report that, while their approach
to evaluation is constructive and positive, they are
urging that the destructive characteristics of tests and measurements
must be resisted in every way by the teaching profession.

During more than 30 hours of Task Force hearings, it was
often reported by expert witnesses that tests are developed and
used in ways that serve to keep certain individuals and groups
"in their place" near or at the bottom of the socio-economic
scale and to assure other individuals and groups that they will
maintain present high status positions both socially and economically.

While the Task Force has recommended that some measurement
and evaluation in education is necessary, it also supports the
Association policy statements quoted above. Members of the Task
Force report that during their deliberations it became increasingly
obvious that the problems of standardized testing cannot be isolated
from the larger and more complicated issues of educational
accountability. And this involves such related developments
facing teachers as:

• Performance-based education

• Performance-based teacher education

• National and state assessment

• Evaluation of educational programs

• Criteria for teacher certification and recertification

• Criteria for teacher selection, retention, promotion,
and dismissal.

Accountability and Testing 
At the NEA Accountability Work Conference in Denver,
May 29-31, 1973, the Association's Executive Secretary, Terry
Herndon, suggested that the time will come when the NEA "establishes
some kind of a testing center, some kind of a center
which, like the American Medical Association in dealing with hospitals,
medical schools, etc., will accredit standardized tests
for use in public schools or to be used by members of our profession."

At the Denver meeting, which was attended by representatives
from 30 states, teachers were urged to pursue two strategies in
dealing with the accountability crisis: (1) to determine how to
stop the destructive practices, including the misuse of standardized
tests, that are growing out of the accountability phenomenon,
and (2) to develop the policy base and the perspective of
the practicing classroom teacher to learn how to sort out the
good systems from the bad systems that have developed.

Three major assumptions have emerged from the Denver Conference,
and they are not unrelated to standardized testing:

1. Adequate programs to deal with accountability can
be developed only with practitioner involvement,
particularly with classroom teachers; practitioners
are the only source of some of the information
needed for making intelligent decisions and practitioners
are vital agents for effective implementation.

2. Our response to the accountability issue should
be in terms of professional responsibility rather
than reaction against any current models proposed
or in operation. In this response there should
be a delineation of professional decision areas in
contrast to decision areas for which others outside
the profession are responsible.

3. Professional practitioners are aware of the lack
of definitive research and hard knowledge to guide
day-by-day practice. It is assumed that in many
cases there is an inverse ratio between what is
measured and what is important in education. There
is a very real danger that the aims of education
will be increasingly restricted to those which can
be most easily measured, rather than those which
are most important. (Emphasis added.)

Because it is closely related to some of the problems associated
with the use of standardized tests, the final report
of the NEA Committee on Accountability is also included here
as it was adopted by the Representative Assembly in July 1973.
(See p. 41.)

What Are Standardized Tests? 

Standardized tests are usually divided into three major
types: achievement tests, aptitude tests, and tests (or "inventories")
of personal interests and/or personality characteristics.
It is estimated that 100 million standardized tests are given in 
this country each year to students from kindergarten through col- 
lege at a cost of $25 million. 

Ebel (6:466 ff.) points out some important differences between
standardized tests and classroom or teacher-made tests: In
the first place, standardized tests come to the user printed and
ready for use. A second and rather obvious difference is that
standardized tests must be purchased. Ebel estimates that typically
the per pupil cost of giving a single standardized test
will range from 20 to 50 cents. A third important difference
between standardized tests and teacher-made tests, according to
Ebel, is in the content covered. "A good teacher-made test includes
a representative sample of the tasks that the students
were taught to handle in that particular class. A standardized
test, on the other hand, must limit its tasks to those likely to
be taught in most classes studying a specific subject. . . . The
emphasis that standardized tests of achievement place on standard
course content may be a valuable counterbalance to the forces
that make for excessive diversity in textbooks and in teaching."
(pp. 467-68.)

It can thus be seen that even the best standardized tests
have a built-in tendency to standardize both curriculum content
and instructional techniques, a characteristic that can have
both desirable and undesirable results in a pluralistic society,
even when the very best standardized tests are used. However,
to use the mentality of the testing community, not all standardized
tests can be "the very best." Some will be "good,"
some will be "average," and half of them (like the children who
must submit to these tests) will always be "below average,"
whatever that may mean.

Related to such problems, the NEA has again this year gone
on record to protect students from the dangers inherent in a wide- 
spread national testing program. An NEA Resolution addresses this 
problem: 

73-11. National Testing and Assessment 

The Association will resist any attempt to transform
assessment results into a national testing
program that would seek to measure all students or
school systems by a single standard, and thereby
impose upon them a single program rather than
providing opportunities for multiple programs
and objectives.

In his very comprehensive Glossary of Terms, Ebel (p. 565)
defines a standardized test as "one that has been constructed
in accord with detailed specifications, one for which the items
have been selected after a tryout for appropriateness in difficulty
and discriminating power, one which is accompanied by a
manual giving definite directions for uniform administration and
scoring, and one which is provided with relevant and dependable
norms for score interpretation. Standardized tests are ordinarily
constructed by test specialists, with the advice of competent
teachers, and are offered for sale by test publishers. Unfortunately
not all tests offered as standardized tests have been
prepared as carefully as the foregoing description suggests."
(Emphasis added.)

National Teacher Examinations 
Since the NEA information package on testing, for which
this document was prepared, is limited to the testing and measurement
of students, no effort has been made to explore the strengths,
the shortcomings, and the widespread misuse of the National
Teacher Examinations, which are sponsored by the Educational Testing
Service. It should be pointed out here, however, that the
NEA Task Force on Testing has taken a position that the National
Teacher Examinations are improper tools and must not be used for
teacher certification, recertification, selection, assignment,
retention, salary determination, promotion, transfer, tenure, or
dismissal. (See p. 24.) Although these tests (NTE) have been used
to license, select, assign, transfer, promote, and dismiss
teachers, a preponderance of the research indicates that no
single objective test instrument has been sufficiently developed
for such purposes.

It would seem apparent, therefore, that use of the NTE for
these purposes represents misuse of the instrument. Interestingly
enough, officials of the Educational Testing Service,
developer and sales agent for the NTE, have acknowledged that some
of these purposes do constitute misuse of the tests. And at the
February 1972 NEA Conference on Civil and Human Rights, Thelma
Spencer, Director of the Teacher Education Examination Program
for the Educational Testing Service, said: "Test scores are
guides only, and the NTE score is merely another piece (by no
means the most important piece) of information about a person.
This test, or any test, is only as good as the people who use
it." (5:16)

NEA Continuing Resolution #6 (1969, 1970, 1972, 1973) states
in part: "The Association believes that examinations such as the
National Teacher Examinations must not be used as a condition of
employment or a method for evaluating educators in service for
purposes such as salary, tenure, retention, or promotion."

The inequality and unfairness associated with improper use
of standardized tests is one reason (and, of course, there are
many others) why the idea of educational accountability as it
is presently being promoted is an empty slogan for those who would
truly improve the quality of learning and teaching. The materials
in this document will provide Association leaders with resources
and background material to help them counsel members to better use
the tools of testing and to resist their use when other means are
more appropriate.
Adopted by the Task Force, May 29, 1973
Prepared for the Task Force by Bernard McKenna
of the report on page 39.
NEA RESOLUTIONS AND NEW BUSINESS ITEMS ON TESTING



72-44. Standardized Tests

The National Education Association strongly encourages the elimination of group standardized
intelligence, aptitude, and achievement tests to assess student potential or achievement
until completion of a critical appraisal, review, and revision of current testing programs.

NEA New Business Items, 1972 

Testing 

This Representative Assembly directs the National Education Association to immediately
call a national moratorium on standardized testing and at the same time set up a task force
on standardized testing to research and make its findings available to the 1975 Representative
Assembly for further action. (Item 28)

The NEA shall establish a task force to deal with the numerous and complex problems
communicated to it under the general heading of testing. This task force shall report its
findings and proposals for further action at the 1973 Representative Assembly. (Item 51)

OTHER SUPPORTING RESOLUTIONS 

C-6. Evaluation and Subjective Ratings

The National Education Association believes that it is a major responsibility of educators
to participate in the evaluation of the quality of their services. To enable educators
to meet this responsibility more effectively, the Association calls for continued research and
experimentation to develop means of objective evaluation of the performance of all educators,
including identification of (a) factors that determine professional competence; (b) factors
that determine the effectiveness of competent professionals; (c) methods of evaluating
effective professional service; and (d) methods of recognizing effective professional service
through self-realization, personal status, and salary.

The Association also believes that evaluations should be conducted for the purpose of
improvement of performance and quality of instruction offered to pupils, based upon written
criteria and following procedures mutually developed by and acceptable to the teacher association,
the administration and the governing board.

The Association insists that the evaluation program must recognize the rights of the
educator who is evaluated. These include the right to:

a. Information concerning the evaluation procedure of the school district or institution.

b. Open evaluation without subterfuge and advance notice of evaluation visits with discussion
of the teacher's goals and methods.

c. Evaluation at least in part by peers skilled in the teacher's professional or subject
area.

d. Consultation in timely fashion after a formal evaluation visit and receipt of and
opportunity to acknowledge in writing any formal evaluation report prior to placement
in a personnel file.

e. Evaluation reports which assess strengths, note progress, indicate remaining deficiencies
and suggest specific measures the teacher can take to overcome indicated
deficiencies.

f. Participation in a professional development program including such activities as
appropriate counseling and supportive services, released time for in-service work,
and opportunity to observe or seek and give assistance to other teachers in classroom
settings other than one's own.

g. Review of any material considered derogatory prior to placement in the individual's
personnel file and submission of a written answer attached to the item in the file.

h. Supervision which is constructive, provides an opportunity to correct deficiencies, 
takes into account the variety of learning and teaching environmental factors, and 
emphasizes career development of the professional educator. 

The Association believes that examinations such as the National Teacher Examination
must not be used as a condition of employment or a method for evaluating educators in service
for purposes such as salary, tenure, retention, or promotion. (69, 70, 72)

72-13. National Testing and Assessment 

The National Education Association notes that the first report of the National Assessment
of Educational Progress on writing, citizenship and science has been issued.

The Association will continue to resist any attempt to transform assessment results into
a national testing program that would seek to measure all students or school systems by a
single standard, and thereby impose upon them a single program rather than providing
opportunities for multiple programs and objectives.

72-8. Student Rights 

The National Education Association believes that basic student rights include: the right
to free inquiry and expression; the right to due process; the right to freedom of association;
the right to freedom of peaceful assembly and petition; the right to participate in the governance
of the school, college, and university; the right to freedom from discrimination; and
the right to equal educational opportunity.

C-10. Improvement of Instruction

The National Education Association believes that a prime responsibility of professional
associations is to stimulate significant improvements in the quality of instruction. Much of
the responsibility to make educational changes should lie with the teachers through their
influence and involvement in democratic decision making in and out of the school.

The Association supports the principle of involving its National Affiliates, Associated
Organizations, and Departments in efforts to improve instruction in our schools.

The Association urges local affiliates to involve members and those affected in the development
and implementation of programs for instructional improvement, curriculum development,
and individualization of instruction relevant to the needs of the students.

The Association recommends that professional educators enter into active collaboration
with research and development specialists, both in regional educational laboratories and in
industry, to promote technology's potential contribution to education by guiding the development
of technology in the most educationally sound directions. It encourages school systems
to establish learning materials centers.

The Association further recommends that the profession, in cooperation with other interested
groups, establish standards for educational materials, and insist that publishers
and producers use the services of a competent educational institution or facility to field test,
in actual classroom situations, such materials, and publish the results of their effectiveness.
(69, 70, 71)
Section I



A GENERAL POINT OF VIEW 



Evaluation is a common practice in American society. From the worn but sturdy cliche
"the unexamined life is not worth living" to the precise timing of the long-distance runner,
ours seems to be a culture of assessment, comparison, evaluation. The large issue to which
the NEA Task Force on Testing has turned its attention is not so much whether there should 
be evaluation but what should be its nature, who should conduct it, how should those who 
conduct evaluation be prepared, and how should the results of evaluation be used. 

The Task Force was impressed with the strong thread running throughout its hearings 
and from the literature of the potential profound effect on human beings' lives of the classifying
and labeling characteristics and uses of tests. It was frequently reported that tests are
developed and used in ways that serve to keep certain individuals and groups "in their places"
near or at the bottom of the social-economic scale and to assure other individuals and groups 
that they will maintain present high status positions both socially and economically. The 
Task Force concluded that while its approach to evaluation would be constructive and positive, 
such destructive characteristics of tests and measurements must be resisted in every way. 
The use of tests, as Arthur Coombs has prioritized the teaching of reading, must at times 
be superseded by the development of the students' self-concept. 

Because the main charge to the Task Force was to respond to NEA resolutions and new
business items on testing and evaluation that appear at the beginning of this report, and
particularly to the issues revolving around standardized testing, the Task Force has devoted
its major efforts, its findings, and its recommendations to those ends.

But the Task Force is aware that the problems of standardized testing are part of a
much broader context, are central to the much more complicated fabric of accountability. 
And woven into that fabric are such other issues as — 

1. Performance-based education 

2. Performance-based teacher education 

3. National and state assessment 

4. Evaluation of educational programs and conditions 

5. Criteria for teacher certification and recertification 

6. Criteria for teacher selection, retention, promotion, and dismissal 

7. Other issues in addition to testing that result in displacement and exclusion of stu- 
dents from learning opportunities. 

It is the point of view of the Task Force that the united teaching profession must ulti- 
mately deal with all of these. But not all can or should be dealt with through the same mecha- 
nisms or along identical time lines. For this reason the recommendations for further study 
are presented in two separate sections: 

One dealing with those issues the Task Force believes to be direct testing issues 
(Section IV); 

And a second dealing with other important assessment and decision-making issues, 
which may need to be dealt with in interlocking NEA programs and projects (Sec- 
tion V). 
The Task Force calls attention here to the significance for its work, and for continuing
work on testing issues, of the resolutions and items of new business of the 1972
Representative Assembly that address themselves to these issues. The resolutions appear in the front
of this report. The Task Force believes that, as stated in Resolution 72-44, the NEA should
continue to encourage "the elimination of the use of group standardized intelligence, aptitude,
and achievement tests to assess student potential until completion of a critical appraisal,
review, and revision of current testing programs." A number of state education
associations have already taken action, based on that recommendation, calling for a moratorium
on testing in their states.

At the same time, the Task Force is aware that, in some states, statutes mandating
testing programs and local school district policies on testing will need to be revised or
removed. The Task Force proposes in Section III of this report areas for immediate action
by NEA. 

Because of the complexity of the tasks that it undertook, the relatively short period of 
time that it functioned, and the commitment of the NEA to continue to study the testing issues
for two more years (1972 Representative Assembly Item of New Business 28), the Task Force
emphasized the identification of specific areas for continued in-depth study. The main sub- 
stance of these areas appears in Section IV. 



Section II 

THE TASK FORCE BELIEVES . . . 

The positions taken below are based on over 30 hours of hearings, survey of the vast
literature on testing and evaluation in education, and debate by Task Force members of the
issues. While time limitations did not permit exhaustive study or empirical research by
the Task Force, the findings are based on expert judgment, experience, and research
reported by witnesses representing such groups as teachers, students, minorities, government
agencies, college and university personnel, school administrators, testing industry, and a
wide variety of professional associations concerned with educational and psychological
testing. The Task Force stands on these premises, recognizing, however, that a number of them
require further investigation. The nature of such investigation is proposed in Sections IV
and V.

1. The Task Force believes that some measurement and evaluation in education is
necessary.

A state education association human relations director told the Task Force, "Don't
deny testing as an essential area... but it must be based on experiences people 
have had." 

Holmen and Docter conclude that ". . . few would argue against allowing schools
to give tests to determine what a student has learned in some course of study."¹

As a representative of a national testing association pointed out, "Descriptions
and decisions are going to be made with or without tests. It's inevitable.... If
we are going to make descriptions and decisions, it makes sense, within limits of 
costs, to seek the best information." 



¹Holmen, Milton G., and Docter, Richard. Educational and Psychological Testing. New York: Russell Sage
Foundation, 1972. p. 13.
2. The Task Force believes that some of the measurement and evaluation tools developed
over the years, and currently in use, contain satisfactory validity and reliability
requirements and serve useful purposes when properly administered and interpreted.

Teachers reported that individual diagnostic instruments in such basic skill areas 
as reading and mathematics are helpful in identifying appropriate remedial action. 
And what is called Item Response Analysis in the Cleveland Public Schools appears
to be a promising approach: clusters of item responses are used to develop
educational prescriptions in response to identified learning problems. Teachers are
treated as the professionals they are in that they are encouraged to select and try
alternate teaching resources; that is, they both develop and apply the prescription.
A key question asked in the Cleveland plan in analyzing clusters of responses is,
"Is this something that should be reasonably attained by the child?"



3. The Task Force believes that certain measurement and evaluation tools are either
invalid and unreliable, out-of-date, or unfair and should be withdrawn from use.

The unfairness of some tests to some students was brought to the attention of the 
Task Force from a variety of sources. A group of minority students told of being 
placed in special education classes on the basis of being below grade level on 
standardized achievement tests, placements that could be adjusted only after 3
years. Instances were related of black students' being denied participation in
extracurricular activities on the basis of tests. Teachers reported that group tests
applied to very small children are unreliable because of the children's varying 
attention spans and maturity levels. 

The Task Force was particularly impressed with substantial testimony to the
effect that both standardized achievement and intelligence tests are unfair to
bilingual/bicultural students as well as to non-English-speaking and
non-standard-English-speaking students. We cite here the following resolution
submitted by the Bay Area Bilingual Education League of California and adopted by
the NEA First American and Hispanic Task Force which bears directly on this
issue:



RESOLUTION

Testing of children whose language is other than standard English with
instruments that were developed for users of standard English violates
the norm and standardization of these instruments and makes the results
questionable. We contend that the use of these instruments with children
whose language is other than standard English is invalid.

Sufficient evidence now exists to direct us to the development of
criterion-referenced assessment systems as a means of improving the
accountability of educational programs. These evaluation processes must
correspond to local performance objectives.

The development of valid test instruments for bilingual and bicultural
children must be directed by qualified bilingual and bicultural personnel in
the educational field or in similar fields, to assure that the test instruments
will reflect the values and skills of the ethnic and cultural groups being
tested.

Whereas currently used standardized tests measure the potential and
ability of neither bilingual nor bicultural children and yet are so used and
relied upon to count, place and track these children, we resolve that such
use of standardized tests be immediately discontinued.

It was also called to the attention of the Task Force that standardized tests
discriminate unfairly on the basis of sex.



4. The Task Force believes that the training of those who use measurement and
evaluation tools is woefully inadequate and that schools of education, school systems, and the
testing industry all must take responsibility for correcting these inadequacies. Such training
must develop understanding about the limitations of tests for making predictions about
potential learning ability, of their lack of validity in measuring innate characteristics, and their
dehumanizing effects on many students. It must also include understanding the students'
rights related to testing and the use of test results.

Teachers reported that they are frequently unfamiliar with the tests they are
required to administer, the purposes of the overall evaluation programs they are a
part of, and the uses that will be made of the results of testing programs. They
told the Task Force that neither preservice nor in-service programs for teachers
provide adequate preparation for administration and interpretation of tests or
prescribing learning activities based on the findings.

Professors of education told the Task Force that the components on tests and 
measurement in teacher education programs are frequently vague or nearly absent, 
and that in many institutions there are no requirements for instructions in tests 
and measurement as a part of teacher education programs. A survey of require- 
ments in the 50 states for instruction in tests and measurements as a prerequisite 
for teacher licensure showed that only 13 states have such requirements and some 
of these apply only to specific groups of teacher trainees, e.g., special education 
and guidance and counseling. 



5. The Task Force believes there is overkill in the use of standardized tests and that
the intended purposes of testing can be accomplished through less use of standardized tests,
through sampling techniques where tests are used, and through a variety of alternatives to
tests.

Holmen and Docter² estimate that at least 200 million achievement test forms are
used each year in the U.S. And this, they report, is only 65 percent of all
educational and psychological testing that is carried out. Even though it is difficult to
know how much is too much in this arena, it appears to represent three or four 
standardized tests per student per year. And this is in addition to the millions of 
teacher-made tests, surveys, inventories, and oral quizzes to which students are 
subjected annually. 
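
The "three or four" figure can be checked with simple division. The sketch below is illustrative only: the enrollment values are assumptions introduced here for the arithmetic and do not come from Holmen and Docter or from the Task Force.

```python
# Rough check of the "three or four tests per student" estimate.
ACHIEVEMENT_FORMS_PER_YEAR = 200_000_000  # lower bound cited from Holmen and Docter

# ASSUMPTION (not from the report): total enrollment somewhere in this range.
ASSUMED_ENROLLMENTS = [50_000_000, 55_000_000, 60_000_000]


def forms_per_student(forms: int, enrollment: int) -> float:
    """Average number of test forms per enrolled student per year."""
    return forms / enrollment


estimates = {
    enrollment: forms_per_student(ACHIEVEMENT_FORMS_PER_YEAR, enrollment)
    for enrollment in ASSUMED_ENROLLMENTS
}

for enrollment, per_student in estimates.items():
    print(f"{enrollment:>11,} students -> {per_student:.1f} forms per student")
```

Under these assumed enrollments the result stays between three and four forms per student, which is consistent with the estimate in the text.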

Representatives of the testing industry and others told the Task Force that
sampling of student populations could be as effective as the blanket application of
tests that is now so common. Some suggested that such procedures, in addition to
increasing the assurance of privacy rights, would conserve time, effort, and
financial expenditure.
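
The sampling argument can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the population of 20,000 scores, the score distribution, and the sample size of 1,000 are all hypothetical and are not drawn from the testimony.

```python
import random
import statistics


def simulate_population(size: int, seed: int = 0) -> list[float]:
    """Hypothetical scores for every student in a school population."""
    rng = random.Random(seed)
    return [rng.gauss(50.0, 10.0) for _ in range(size)]


def sample_estimate(scores: list[float], sample_size: int, seed: int = 1):
    """Estimate the population mean from a simple random sample.

    Returns the sample mean and its standard error.
    """
    rng = random.Random(seed)
    sample = rng.sample(scores, sample_size)
    mean = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / sample_size ** 0.5
    return mean, std_error


population = simulate_population(20_000)        # blanket testing: everyone tested
blanket_mean = statistics.fmean(population)
sampled_mean, std_error = sample_estimate(population, 1_000)  # 5 percent tested

print(f"blanket mean: {blanket_mean:.2f}")
print(f"sampled mean: {sampled_mean:.2f} (standard error {std_error:.2f})")
```

In this sketch one student in twenty is tested, yet the group-level estimate carries a small standard error. No individual score exists for the untested students, which is the privacy point raised in the testimony.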



6. The Task Force believes that the National Teacher Examinations are an improper
tool and must not be used for teacher certification, recertification, selection, assignment,
retention, salary determination, promotion, transfer, tenure, or dismissal.



The Task Force heard testimony that the National Teacher Examinations have been
used to license, select, assign, transfer, promote, and dismiss teachers. Research
indicates that no single objective tool is highly enough developed for these purposes.
It therefore seems apparent that application of the NTE for these purposes
represents misuse of the instrument. The Educational Testing Service itself, developer
and marketing agent for the Examinations, has acknowledged that some of these
purposes constitute misuse of the test.



²Ibid., p. 38.
7. The Task Force believes that the results from group standardized tests should not
be used as a basis for allocation of federal or state funds.



The Task Force learned that in some states some funds are distributed to schools
on the basis of student scores on standardized tests. And some guidelines for
proposal development in applying for federal funds require that systemwide testing
programs be agreed to as part of eligibility for participation. Since standardized
tests apply so unevenly to different groups and individuals and often poorly predict
potential learning ability, and since so many of them are incapable of diagnosing
the most significant learning difficulties, it would appear that their use for
determining which educational programs should be funded and for what students would
result in inaccuracy and unfair treatment of some groups and individuals.



8. The Task Force believes that standardized tests should not be used for tracking
students.

Tracking in and of itself has been a practice of questionable value for
many years. A concentration of studies in the 20's and 30's found little evidence
that homogeneous grouping improved student learning. In the 50's, when American
schools were being pointed at as contributing to the United States's second position
in the space race, tracking was again widely instituted, followed by another
concentration of studies on its effects. The findings the second time indicated that in
general children who were grouped learned no more than those who were treated
heterogeneously. To date no substantial evidence of increased learning as a result
of tracking has been produced, yet tracking goes on. Some kinds of special
education may be defensible for some students for part of the time on the basis of making
teachers' jobs more manageable. But if this or other reasons apply, they should be
put forth, rather than that learning is improved. Even then, assignment to special
programs should be based on individual student needs determined by individually
administered diagnostic instruments, by mutual agreement with parents, and on a
part-time and temporary basis. There should be opportunities for students to move
back and forth from regular to special programs as their social and emotional needs
as well as academic requirements indicate.

9. The Task Force believes that while the purposes of the National Assessment of
Education may have been [text illegible in source] such programs have subverted the
original intent and as a result are potentially harmful.

A main purpose of the National Assessment of Educational Progress has been to
determine, for representative samples of the American public, levels of
understandings and abilities to perform in a variety of areas considered by its developers
important for a large majority of the society. The Task Force believes, as reported
by an earlier NEA Task Force,³ "that all Americans need to be educated, and that
it is essential to identify the educational needs of our people and to respond to those
needs with relevant and effective educational programs, both through formal
schooling and through other means." (The NEA Task Force on Compulsory Education, in
its report, recommends a number of promising alternatives to present school
organization and process for accomplishing the ends.) The Task Force on Testing
is supportive of efforts to identify the educational needs of the American people.

But adaptations of national assessment programs in some states are being
manifest as statewide testing programs, applied to all students, and used to compare

³NEA Task Force on Compulsory Education. Report of the Task Force. Washington, D.C.: NEA, 1973.
population groups, school systems, individual schools, even teachers and students.
Both such applications and the dissemination of the results from them have
deleterious effects on students and teachers and evoke inaccurate and negative 
responses in public understanding of and attitude toward the schools. Members have 
expressed concern about National Assessment through Resolution 72-13.⁴

10. The Task Force believes that both the content and use of the typical group intelligence
test are biased against those who are economically disadvantaged and culturally and
linguistically different, and especially against all minority groups.

Hoffman reports, "There is no generally satisfactory method of evaluating human
abilities and capabilities, though occasionally it can be done individually with
remarkable precision."⁵

Considerable research over the years has led to the conclusion that the most
commonly used group intelligence tests measure only one aspect of intelligence:
verbal capacity. And even if it were agreed that this aspect is an important
predictor of capacity to be successful in the society, conventional intelligence tests
still are grossly flawed. For these reasons some have called for complete
elimination of group tests of mental ability, including abolishment of the term "IQ."

Scores on tests of mental ability are so influenced by past experience and
cultural background that they are highly biased in favor of those groups whose
experience and culture the items reflect. The content frequently highly reflects
middle-class culture and experience. The tests are often characterized by an ambiguity
that confuses those who think critically and in depth. Hoffman⁶ reported this more
than a decade ago. In addition, the work of Getzels and Jackson, later followed up
by Torrance, has shown that intelligence tests reflect mainly the ability to converge
on single, predetermined correct answers. An important prerequisite to creativity,
the ability to carry on divergent thinking, is not often measured in the typical
intelligence test. As Barzun has put it, mechanical tests raise mediocrity above
talent.⁷

Edward Casavantes, a prominent Chicano psychologist,⁸ told the Task Force
that poverty alone is the major factor in causing minority groups to appear to be of
less ability than others.

This effect of poverty on IQ is further substantiated by Jane Mercer in a report
on her landmark research in which she states that "persons from the lowest
socioeconomic groups were far more likely to be (considered mentally retarded) than
were those from higher status levels."⁹



⁴"National Testing and Assessment - 72-13." The National Education Association notes that the first report
of the National Assessment of Educational Progress on writing, citizenship, and science has been issued.

The Association will continue to resist any attempt to transform assessment results into a national testing
program that would seek to measure all students or school systems by a single standard, and thereby impose
upon them a single program rather than providing opportunities for multiple programs and objectives.
Washington, D.C.: NEA, 1972.

⁵Hoffman, Banesh. "Psychometric Scientism." Phi Delta Kappan, April 1967.
⁶Hoffman, Banesh. The Tyranny of Testing. New York: Crowell-Collier, 1962.

⁷Barzun, Jacques. Teacher in America. Boston: Little, Brown, & Co., 1945. (Doubleday, Anchor Books,
1954.)

⁸Casavantes, Edward, Executive Officer, Association of Psychologists por La Raza, in testimony before the
Task Force, March 31, 1973.
⁹Mercer, Jane R. Labeling the Mentally Retarded. Berkeley: University of California Press, 1972.
11. The Task Force believes that the use of the typical intelligence test contributes to
what has come to be termed "the self-fulfilling prophecy," whereby students' achievement
tends to fulfill the expectations held by others.

The Task Force was impressed by considerable testimony in support of the findings
of the Rosenthal and Jacobson study.¹⁰ Where heavy emphasis is placed on
intelligence testing, students may tend to be pigeonholed on the basis of tests. Less is
expected of those who do less well on the tests. There is little question that
teachers' expectations contribute to student performance. Thus, it can be concluded that
those who are expected to achieve less actually achieve less, and vice versa.



12. The Task Force believes that test results are too often used by educators, students,
and parents in ways that are hurtful to the self-concept of many students.

Holmen and Docter¹¹ report that of all the criticisms of tests this one is the most
difficult to dismiss. Few would deny the importance of a positive self-image to
enhance the possibilities for student learning.



13. The Task Force believes that the testing industry must demonstrate significantly
increased responsibility for validity, reliability, and up-to-dateness of their tests, for their
fair application, and for accurate and just interpretation and use of their results.

The Task Force objects to the strong tendency of representatives of the testing 
industry to place most of the blame for the problems of testing on test usage and to 
assume little responsibility for the uses made of their products. 

But a prior issue is the responsibility of the industry to ensure relevant
content, validity, and reliability in its product. The Task Force was told that some
tests remain on the market for many years beyond a time when much of their
content has become irrelevant simply because there continues to be a market for them.

Matters of validity and reliability, fair application, and accurate and just inter- 
pretation and use are dealt with at other places in this report. It need only be re- 
iterated here that these are joint responsibilities in which the testing industry needs 
to participate much more than it has in the past. 



14. The Task Force believes that the public, and some in the profession, misinterpret
the results of tests as they relate to status and needs of groups of students as well as to
individual students.

The statistical fact that 50 percent of any population will always end up below the
mathematical average ("norm") leads many to believe that being below average
means poor quality performance. This is not necessarily so. The mathematical
average may or may not be highly related to competent performance. The public,
particularly, needs to come to understand that norming processes automatically
place half the students below the average, no matter how well they perform. The
Task Force heard testimony that the use of Grade Equivalent scores leads to
drawing inappropriate conclusions on the part of educators, parents, and students.
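
The norming point can be shown with a short worked example. This is a sketch only: both sets of scores and the mastery cutoff of 90 are hypothetical values chosen for illustration. Strictly, the "half below" property holds for the median; it holds for the arithmetic mean only approximately, when scores are roughly symmetric.

```python
import statistics


def fraction_below_median(scores: list[float]) -> float:
    """Share of a group that falls below that group's own median."""
    median = statistics.median(scores)
    return sum(score < median for score in scores) / len(scores)


def fraction_meeting_criterion(scores: list[float], cutoff: float) -> float:
    """Share of a group at or above a fixed performance standard."""
    return sum(score >= cutoff for score in scores) / len(scores)


# Hypothetical groups: one with modest scores, one in which everyone scores high.
modest_group = [41, 45, 48, 52, 55, 58, 60, 63]
strong_group = [91, 92, 93, 94, 96, 97, 98, 99]

MASTERY_CUTOFF = 90  # ASSUMPTION: an illustrative criterion, not from the report

for name, group in [("modest", modest_group), ("strong", strong_group)]:
    below = fraction_below_median(group)
    mastery = fraction_meeting_criterion(group, MASTERY_CUTOFF)
    print(f"{name}: {below:.0%} below own median, {mastery:.0%} meet the criterion")
```

Half of each group falls below its own median, even though every student in the second group meets the fixed criterion. Standing relative to a norm and competent performance are separate questions.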

15. In summary, the Task Force believes that the major use of tests should be for the
improvement of instruction: for diagnosis of learning difficulties and for prescribing

¹⁰Rosenthal, R., and Jacobson, L. Pygmalion in the Classroom: Teacher Expectations and Student's
Intellectual Ability. New York: Holt, 1970.
¹¹Holmen and Docter. Ibid., p. 38.
learning activities in response to learning needs. They must not be used in any way that
will lead to labeling and classifying of students, for tracking into homogeneous groups as
the major determinants to educational programs, to perpetuate an elitism, or to maintain
some groups and individuals "in their place" near the bottom of the socioeconomic ladder.
In short, tests must not be used in ways that will deny any student full access to equal
educational opportunity.



Section III 

RECOMMENDATIONS FOR IMMEDIATE ACTION 
(1973-74 Year)

1. In the fall of 1973, the NEA should provide to all state affiliates, for
communication to all state-affiliated locals, and to agencies and associations concerned with
educational testing issues, specific guidelines appropriate for adoption as local
school district policy calling for:

a. Immediate replacement of blanket use of (i.e., application to all students) group
standardized achievement tests by sampling where necessary of the various 
school populations 

b. Provision to local school districts by test suppliers of procedures for using 
different item samples on different student populations and individuals. 



The Task Force believes that immediate implementation of such procedures
will serve the purpose of improving the conditions surrounding rights of privacy
of students, and prevent publication of scores conducive to stigmatizing
minority and nonminority students. Such procedures would also reduce the
inordinate amount of time spent in test administration and scoring.

2. In the fall of 1973, NEA should begin consultation with the National Council on
Accreditation of Teacher Education and the American Association of Colleges of 
Teacher Education to influence revision of the current accreditation standards and 
school of education curricula to include specific requirements for instruction in 
tests and measurement for all preservice teacher education programs. In such 
consultations, topics should include items listed under No. 4 of "The Task Force 
Believes," p. 24. 

3. The NEA should begin consultation with such organizations as NCME, AERA, and
APA to consider appropriate revisions to the Standards for Development and Use of
Educational and Psychological Tests developed cooperatively to assure the proper 
development and use of standardized tests. 

4. By February 1 of 1974, the NEA should provide to all state affiliates and to agencies
and associations concerned with testing issues, for communication to all
state-affiliated locals, specific guidelines appropriate for adoption as local school
district policy calling for:

The local development of criterion-referenced tests in all branches of the
curriculum as alternatives to current standardized testing programs.

While the Task Force has been cautioned that the local development of valid and
reliable criterion-referenced tests is a complex and time-consuming job, we
believe it must be done, and such efforts must get under way immediately.


5. By June 1 of 1974, the NEA should provide to all state affiliates, for communication
to all state-affiliated locals, and to agencies and associations concerned with
educational testing issues, specific guidelines for minimal content for in-service
education programs for teachers and other school staff, including paraprofessionals,
on tests and measurement. Such content should include items listed under No. 4 of
"The Task Force Believes," p. 24.

6. By June 1974, the NEA should provide to all state affiliates, for communication to
all state-affiliated locals, and to other agencies and associations concerned with
educational testing issues, news-release type materials for use in educating both
educators and the public on the appropriate uses and limitations of test results and
familiarization with a range of alternatives to current common testing practices.

Section IV 

RECOMMENDATIONS FOR FURTHER STUDY BY THE 

NATIONAL EDUCATION ASSOCIATION ON TESTING ISSUES 

The recommendations that follow are intended to be pursued during the 1973-74 year 
concurrently with the implementation of those in the preceding section. In addition, the 
recommendations in this section should be pursued in depth throughout 1974-75, final
recommendations for policy and action to be made by the Task Force on Testing to the 1975 NEA
Representative Assembly.

Goals for Accomplishment by 1975

The Task Force recommends intensive study leading to specific action recommendations 
on the following by June 1975: 

1. Essential roles and responsibilities of various concerned groups¹² in assuring
sound and fair development of evaluation systems

The term evaluation systems is used here instead of tests because it is urged
and expected that a wide variety of alternatives to tests should and can be
developed for evaluation purposes. The Task Force was cautioned that
alternatives, perhaps even more than conventional tests, must be subjected to
rigorous research and test and tryout leading to validation.

2. Essential roles and responsibilities of such groups¹² in assuring appropriate
distribution and administration of evaluation systems

3. Essential roles and responsibilities of such groups¹² in assuring accurate and fair
interpretation of the results of evaluation systems

4. Essential roles and responsibilities of such groups¹² in assuring relevant and
constructive action programs based on the results of the use of evaluation systems.

The above should be considered general goals. Action programs will need to be developed 
for accomplishing each of the goals. Some programs may be developed that will respond to 
more than one of the goals. 



¹²See page 22, Part I, for listing of groups.
Recommended Areas for In-Depth Study Required To Accomplish the Goals



The categories listed here were identified early in the deliberations of the Task Force 
and have been refined as the issues were studied and discussed. The Task Force began to 
have some strong impressions about some of them on which recommendations and actions 
might be taken. The Task Force speaks out on these in Section II. But as was indicated in
Section I, because of the complexity of the subject, the limitation of time, and because, by 
resolution, the NEA is committed to study the testing issue for two more years, the Task 
Force rather emphasized the identification of areas for in-depth study. 

It is recommended that each of the categories below be studied in depth during 1973-74 
and that the final recommendations to the 1975 Representative Assembly reflect actions di- 
rected to the specific items in each category. The categories are The Student, The Teacher, 
The Testing Industry, The Government, and Other Agencies and Associations. 

I. The Student 

A. Effect of tests on labeling and classifying students in ways that restrict the
development of their potential.

B. Bias in test content that leads to unfair results with some groups on the basis of
race, sex, socioeconomic status, bilingual/bicultural, non-English- and
non-standard-English-speaking.

C. Effect of tests on student self-concept. 

D. Effect of tests on the "self-fulfilling prophecy" concept. (See p. 27.) 

E. Degree to which the content and use of tests invades privacy of students. 

F. Degree to which publication of test scores invades the privacy of students. 

G. Degree to which tests affect the more mobile members of the student population.

H. Degree to which tests contribute to the development in students of limited cognitive
styles, e.g., convergent as opposed to divergent thinking. (See p. 26.)

I. Promise of alternatives for evaluating human capabilities such as the Ertl Index, the
Belmont Battery, Test of Logical Thinking.

II. The Teacher 

A. Effects of tests applied to teachers, i.e., professional status, morale, feelings of 
security. 

1. National Teacher Examinations and other tests applied directly to teachers. 

2. Use of student test results to judge teachers for retention, tenure, promotion. 
(See p. 24.) 

B. Effect of tests on curriculum development by educators. 

C. Effects of tests on experimentation with and implementation of new ways of teaching. 

D. Effects of teaching to the tests. 

E. Effects of tests on teachers' ability to individualize instruction. 

F. Effects of mandated testing programs on teacher academic freedom. 

G. Effects of use of tests to hold teachers responsible for educational outcomes of 
students. 

III. The Testing Industry 

A. The responsibility of the industry for distribution of valid, reliable, up-to-date 
products. 

B. The responsibility of the testing industry to withhold tests and services where there 
is reasonable certainty they will be misused. 

C. The responsibility of the testing industry to provide validation data for specific 
regions and specific populations. 

D. The responsibility of the testing industry to consult with professional organizations 
in the development of standards of training for test usage and to share in the 
responsibility for enforcement of the standards. 

E. The responsibility of the testing industry to relate testing to curricula and to assure 
that appropriate methods of evaluation be considered an integral part of curriculum 
development. 

F. The responsibility of the testing industry to conduct in-depth research, testing, and 
tryout of its products, to continuously research their effects throughout the time of 
their use, and to share information with the profession and the public on the extent 
of this research effort. 



IV. The Government 

A. The responsibility of government at all levels (national, state, and local, including 
local school boards) to assure that biased evaluation systems, and particularly the 
results of standardized tests, are not used for the allocation of funds. 

B. The responsibility of government at all levels to assure that the results of national 
and state assessment programs are not used for labeling and classifying students 
or for judging teachers. 

C. The responsibility of government at all levels to assure that national and state 
assessment programs do not lead to national and state curricula. 

D. The responsibility of government at all levels to assure that the results of tests are 
not publicized in ways that violate the privacy of individuals or stigmatize specific 
populations, school building units, or school attendance areas. 

E. The responsibility of government at all levels to assure that the results of tests are 
not used in any way to promote segregation among or within schools, or to negatively 
affect teacher assignment. 

F. The responsibility of government at national and state levels to provide standards 
of licensure for test developers and producers. 



V. Other Organizations and Associations 

A. The responsibility of national professional associations and other organizations 
associated with testing to fully involve, in a formal relationship, the organized 
teaching profession in all activities leading to the development of all policy, guide- 
lines, and procedures related to test development and usage. 

B. The activities of the College Entrance Examination Board in influencing college 
admission policy through the use of tests, including the effects of the work of its 
Commission on Tests. 

C. Colleges' and universities' responsibilities in developing and implementing alter- 
natives to present testing arrangements for admission to higher education. 

1. In this regard, the strengths of present open admissions programs should be 
studied and recommendations made on the basis of findings on their success 
and promise. 

2. Examination of the effects of the College Means Admission Program on students 
and the institutions. 

D. The responsibility of the Education Commission of the States to assure that appro- 
priate guidelines and cautions accompany the dissemination of both the instruments 
for and results from the National Assessment of Educational Progress. 

E. Further cooperation with such testing reform efforts as The National Project on 
Testing in Education and The National Project on Educational Testing. 

Some Recommended Actions for Accomplishing the Goals 



The recommendations that follow represent some, not all, specific actions to be taken 
that will contribute to accomplishing one or more of the four goals stated previously (p. 29). 
These actions will need to become part of broader programs. It is expected that the 
continuing work of the Task Force on Testing will give high priority to spelling out such 
programs. (The numbers in brackets following the items indicate the goal or goals which the 
particular action will contribute to accomplishing.) 

1. Develop model standards of training and experience for state certification require- 
ments for all those who administer and/or use test results in the school. (2) 

2. Develop action plans to assure better control of test development and distribution 
by the testing industry through — 

a. Influencing appropriate federal and state agencies to better protect test con- 
sumers, 

b. Specifically, reducing legal barriers, including restrictions on KTC's refusing 
test sales to unqualified users. (2) 

c. Supporting legal action where appropriate to challenge misuse of tests and 
violation of rights of educators and students. 

3. Develop a program for broad publicizing of guidelines for collection, maintenance, 
and dissemination of pupil records, including those recommended in — 

a. NEA Code of Student Rights and Responsibilities. 

b. Guidelines for the Collection, Maintenance and Dissemination of Pupil Records, 
a report of the Russell Sage Foundation. (3) 

4. Extend the guidelines cited in #3 above by developing model policy statements on 
the publication of and general dissemination of test scores. (3) 

5. Recommend specific alternatives to standardized tests appropriate to the 
evaluation of students and educators. 



Section V 

RECOMMENDATIONS FOR FURTHER STUDY BY THE 

NATIONAL EDUCATION ASSOCIATION ON OTHER 

ASSESSMENT-RELATED ISSUES 

As was pointed out in an earlier section, testing is a part of a much broader fabric that 
has come to be called accountability. Accountability means different things depending on 
who is defining it. But to many in the public and some in the profession it has to do directly 
with producing specific outcomes with students, particularly in such basic skill areas as 
reading and mathematics. This aspect of accountability is obviously directly related to 
testing in that student performance is most often measured by the use of tests, particularly 
standardized tests. Other test-related issues that also are important to the accountability 
movement include: 

1. National and state assessment programs 

2. Performance-based education 

3. Performance-based teacher education 

4. Management by objectives 

5. Program, planning, budgeting, evaluation systems 

6. Evaluation of educational programs and conditions 

7. Criteria for teacher certification and recertification 

8. Criteria for teacher selection, retention, salary determination, 
promotion, and dismissal. 

Each of these is in some way related to the others, to evaluation, and to tests and 
measurement. But the Task Force believes that several of these may not fall directly within 
the purview of the Task Force on Testing. 

We recommend that, as the testing issues continue to be studied and acted upon (as 
recommended in the preceding section), item #1 above (the issues surrounding national and 
state assessment) continue to be considered along with the other testing issues. 

The others in the above list should be dealt with as follows: 

1. Numbers 2, 3, and 7 are of extreme importance to the teaching profession and should 
become the concern of a national task force appointed by the NEA president, with 
an appropriate secretariat and with its work coordinated with the NEA program 
budget. 

The Testing Task Force learned of three national efforts on performance-based 
education and teacher education. None of these, at present, has had substantial 
input from the organized teaching profession. One of them, spearheaded by 
the Educational Testing Service, threatens to become a major effort to centralize 
coordination of the entire performance-based movement. 

The Task Force strongly urges that appropriate administrative assignments be made 
as soon as possible so that staff can begin working toward resolving those test-related 
issues which do not fall under the direct charge of the Task Force on Testing. In addition, 
all test-related issues should be vigorously pursued as directed by the appropriate resolu- 
tions and items of new business dealing with testing. 



RESOURCE MATERIALS 

Report of the Commission on Tests, I. Righting the Balance, College Entrance Examination 
Board, New York, 1970. 

Report of the Commission on Tests, II. Briefs, College Entrance Examination Board, New 
York, 1970. 

An Investigation of Sources of Bias in the Prediction of Job Performance, A Six-Year Study: 
Proceedings of Invitational Conference, The Barclay Motel, New York, June 22, 1972, 
Educational Testing Service, Princeton, New Jersey. 

Ethnic Isolation of Mexican Americans in the Public Schools of the Southwest, Report I, U.S. 
Commission on Civil Rights, Mexican American Education Study, April 1971. 

The Unfinished Education: Outcomes for Minorities in the Five Southwestern States, Report 
II, Mexican American Educational Series, A Report of the U.S. Commission on Civil 
Rights, October 1971. 

The Excluded Student, Educational Practices Affecting Mexican Americans in the Southwest, 
Report III, U.S. Commission on Civil Rights, May 1972. 

Mexican American Education in Texas: A Function of Wealth, Report IV, Mexican American 
Education Study, U.S. Commission on Civil Rights, August 1972. 

Methodological Appendix of Research Methods Employed in the Mexican American Education 
Study, U.S. Commission on Civil Rights, January 1972. 

Evaluation in the Inner City, Report of an Invitational Conference on Measurement in 
Education, April 24-25, 1969, Philadelphia, Pennsylvania. 

Metropolitan Achievement Tests Special Report, 1970 Edition, Selected Items Revised or 
Eliminated from the Metropolitan as a Result of Possible Bias Against Minority Groups, 
Report No. 23 issued by Test Department, Harcourt Brace Jovanovich, Inc., September 
1972. 

The Use of Standardized Instruments with Urban and Minority-Group Pupils, Thomas J. 
Fitzgibbon, Test Department, Harcourt Brace Jovanovich, Inc. 

Standardized Test Report, 1973 Delegate Assembly, Iowa State Education Association, 
Informational Services Division, December 1972. 

Standards for Development and Use of Educational and Psychological Tests, Third Draft 
(formerly called Standards for Educational and Psychological Tests and Manuals). 

The Responsible Use of Tests: A Position Paper of AMEG, APGA, and NCME, reproduced 
by permission from Measurement and Evaluation in Guidance, Vol. 5, No. 2, July 1972. 

Project Opportunity, A Demonstration Guidance Project for Minority Poverty Youth, March 
1973, College Entrance Examination Board. 

Pluralistic Diagnosis in the Evaluation of Black and Chicano Children: A Procedure for 
Taking Sociocultural Variables into Account in Clinical Assessment, presented at the 
Meetings of the American Psychological Association, Washington, D.C., September 3-7, 
1971, by Jane R. Mercer, Associate Professor, Sociology, University of California. 

On the Explanation of Racial-Ethnic Group Differences in Achievement Test Scores, George 
W. Mayeske, U.S. Office of Education, Washington, D.C. 

School Psychology and the Mexican American, Dr. Steve G. Moreno, AMAE Newsletter, 
May 1972. 

Position of American Association of School Administrators, George B. Redfern. 

Educational Measurement of What Characteristic of Whom (or What) by Whom and Why, Jack 
C. Merwin, University of Minnesota, Journal of Educational Measurement, Volume 10, 
No. 1, Spring 1973. 

A Moratorium on Public School Testing, The Position of a Professor of Measurement and 
Evaluation, Andrew L. King, D.C. Teachers College. 

Studies Relative to "Improvement" in the Cognitive Performance of Minority Children Under 
Special Conditions, Edward J. Casavantes. 

A Litany of Laments: Elementary and Secondary School Factors Affecting Minority Group 
Resources for the Health and Mental Health Professions, Edward J. Casavantes (paper 
presented at a symposium at the American Association for the Advancement of Science 
Convention on Minority Group Resources for the Health Professions, December 27, 1972). 

Summer "Experience in Publishing" Training Fellowships, Harcourt Brace Jovanovich Test 
Department Memorandum, March 15, 1973. 

Testimony - Donald Ross Green, Director of Research, CTB/McGraw-Hill, Monterey, 
California. 

Testimony - James H. Ricks, Jr., The Psychological Corporation. 

Testimony - Raphael Minsky, Supervisor of Psychological Services, Montgomery County 
Public Schools, Rockville, Maryland. 

Testimony - Winton H. Manning, Educational Testing Service. 

Testimony - Lois J. Wilson, New York State United Teachers Assistant Executive Secretary 
for Studies and Professional Services and Chairperson, NEA Human Relations Council 
toTTl\ 

Testimony - Yvonne Burkholz, Dade County Classroom Teachers' Association, Inc. 

Brief Report on the Teaching Situation in Alabama, Anthony Butler, Past President, Alabama 
State Teachers Association. 

Points on Testing Requiring Consideration by TTF, Miss Carol Wick, Varying 
Exceptionalities Resource Teacher, Nashville, Tennessee. 

Summary of Statement by Dr. Robbins Barstow, Director of Professional Development, 
Connecticut Education Association. 

Teachers Association of Anne Arundel County, Inc. Action Report, Volume 5, No. 3, Decem- 
ber 1972. 

Statement on Standardized Tests by Richard C. Gordon, President, Virginia Education 
Association. 

Misc. News Stories on Release of Test Scores: "Poor Marks for the City," New York Times, 
3/25/73; "Reading Scores Decline in the City Schools Again"; "Schools in Fairfax Boast 
of Scores," Washington Post; "Below-Grade Publicity About Reading Scores," New 
York Times, 3/25/73. 

Informal Report on Crownpoint . . . Eastern Navajo: Eleven Schools' Reactions to the Testing 
Program. 

Reading and Testing: "One Cause of Reading Failure Is Reading Failure," New York 
Teacher (Magazine Section), an original article by Deborah Meier. 

Teacher Education and Professional Standards, Commission of Washington Education Asso- 
ciation, Vol. 1, No. 1, February 1973. 

The Scheelhaase Case: Scheelhaase vs. Woodbury Central Community School District, et al. 

Moville Record article published March 4, 1971, "Individual Basic Skills Scores Reported." 

Educational and Psychological Testing: A Study of the Industry and Its Practices, Milton G. 
Holmen and Richard F. Docter, Russell Sage Foundation, New York, 1972. 

Journal of the Association of Mexican-American Educators, Inc., Vol. 1, No. 1, May 1973. 

Violations of Human and Civil Rights: Tests and Use of Tests — Report of the Tenth National 
Conference on Civil and Human Rights in Education, Washington, D.C. 1972. 

In January of this year, the Executive Committee established, upon the 
recommendation of the Council on Instruction and Professional Development, 
an Accountability Committee, the structure and program for which was approved 
by the Board of Directors in February. 

NEA activity in accountability was supported by the National Council of 
State Education Associations, the Council on Instruction and Professional 
Development, the North Central Regional Advisory Council, NEA's Denver 
Accountability Conference, the NEA Executive Committee and the NEA Board of 
Directors. In each case, the support was unanimous. 

Since its inception, the Accountability Committee, chaired by myself, and 
the Council on Instruction and Professional Development, chaired by Mel Leasure, 
have worked cooperatively in collecting data, providing information and developing 
the position offered in this report. The adoption of this report by this 
Assembly will provide the necessary direction by the membership to NEA 
governance and staff to ensure positive, aggressive action in the area of 
accountability at the national, state and local levels. 

It is the feeling of all of us involved in this project that the need 
for this action is critical. At a time when teacher negotiators are being told 
they are proposing champagne programs for a beer budget, at a time when 
federal money for education is at a level less than subsistence, approximately 
$300,000,000 per year is being spent on testing. 

The tests developed at such a terrible price tag are not only totally 
inadequate for the advertised purpose of providing legislative guidance for the 
allocation of funds, but because they measure only a tiny portion of the 
educational effort, the misuse of the test results is doing great violence 
to the creative educational process. 

There is a cult of empiricism existing within the community of educational 
researchers. Since measurement in the affective domain does not provide 
empirical evidence, measurement in that domain is generally excluded. At the 
same time, we as teachers are being told to humanize instruction. If current 
trends continue, it is most probable that either the humanizing of instruction 
will diminish in importance, or the teachers directed to work in that area will 
not be evaluated for their work in that area. 

There is ample evidence that test results are being used to manipulate 
minority teachers and administrators within school districts. Civil rights 
are similarly being disregarded in the case of non-minority teachers. 

It may be true that if teachers were smart, smart in the industrial 
orientation through which accountability programs emanated, they would accept 
this trend. Instructing children on a single, empirical item program would be 
infinitely easier than creative teaching. The tragedy is that the teacher 
would not have an option in many cases. He would have "to teach to the test" 
first in order to survive as an educator, then participate in creative teaching 
with whatever time is left. If this is not true in your situation, rest 
assured that it IS true in the case of a fellow teacher in another state. 

This report dwells heavily on testing because measurement is the heart of 
accountability and "measurement" to all too many of us in education means 
"tests." The spectrum of accountability programs varies from performance- or 
competency-based teacher education to certification regulations, statewide assessment 
and teacher evaluations, teacher preparation, elimination of community 
needs in educational control, some forms of planned program budgeting, staffing 
innovations, etc. Competency-based teacher education has basically been 
implemented since the Atlantic City Representative Assembly. 

The programs vary in quality from a few of high quality to a great many 
of very low quality. A few programs genuinely seek to improve education, and 
many more seek relief from economic pressures. It is painfully apparent that 
millions of dollars that should be going to the classroom are being spent on 
accountability programs that are not themselves accountable. 

There are many reasons for the misuse of accountability programs, but 
to me three are of prime importance. First, districts are so desperately 
under-financed that programs that should be directed at analyzing and improving 
educational programs are instead utilized for controlling costs, and the 
greatest cost to any district is its teaching staff. 

National priorities are clear enough -- around 7% of the federal budget 
goes to education, while over 50% goes to our military posture. Since we now 
have the equivalent of several hundred pounds of TNT for every pound of human 
flesh and blood on earth, it would seem to be time to make plans to save 
and expand life, not destroy it. 

Secondly, teachers together with the children they serve are the 
non-participating victims of these programs. It is time the future of 
education be guided by those who know about education, not those well-intentioned 
amateurs who think they know about education. 

Until teachers function actively in the decision-making of the accountability 
arena, they will continue to be unfamiliar with the vocabulary of accountability, 
and thereby be less effective as debaters, either as part of the decision- 
making or as constructive critics while others make the decisions. 

Third, for every piece of clearly anti-teacher accountability legislation, 
there are several laws and regulations passed with good intentions. As those 
stated intentions are converted to implementation, the components of the 
legislation are adulterated by industrial standards, economic desperation, 
and anti-teacher school boards and administrators. Teachers support the 
good intentions, and then are saddled by bad implementations. 

The request for accountability is eminently reasonable. The question is 
accountability by whom, for what, and how they are to be held accountable. 
If industrialists were as accountable as they demand teachers be, Nader's 
Raiders would be out of business. 

If the current trend continues, several million copies of Harry Truman's 
sign, "The Buck Stops Here," should be reproduced, one for the desk of every 
student in the country. 

The report of the Denver Accountability Conference, developed and 
unanimously adopted by the thirty states in attendance, constitutes a basic 
guidebook for NEA activity. It includes the development of positive alternative 
programs and legislation, the disbursement of public and professional information, 
support for states and locals to either combat bad programs or develop good 
ones, the continuous monitoring of accountability programs with appropriate 
responses, and the establishment of teachers as one portion of accountability 
along with legislators, school boards, students, administrators and teacher 
preparation institutions. 

Madam Chairman: as Co-Chairman of the Denver Accountability Conference 
and as Chairman of the Accountability Committee, I move the adoption of 
this report and the following enabling motion: 

Be it moved that the NEA Board of Directors, Executive Committee, 
and Executive Secretary are hereby directed by this Representative 
Assembly to take any steps necessary to mobilize sufficient resources 
of personnel and funds to develop and mount, during the fiscal year 
1973-74, a unified, comprehensive program in the accountability 
arena. 

Such a program is to draw upon personnel resources from any 
vantage point, national, state or local. It should mobilize and provide 
for coordinated effort involving all levels. 

The report of the Denver Accountability Conference shall serve 
as the primary basis for the NEA accountability program. 